A Dynamic Language Model Based on Individual Word Domains
Abstract
We present a new statistical language model based on a combination of individual word language models. Each word model is built from an individual corpus which is formed by extracting those subsets of the entire training corpus which contain that significant word. We also present a novel way of combining language models called the “union model”, based on a logical union of intersections, and use this to combine the language models obtained for the significant words from a cache. The initial results with the new model provide a 20% reduction in language model perplexity over the standard 3-gram approach.
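The corpus-extraction step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a "subset" of the training corpus is a single document, that tokens are obtained by lowercased whitespace splitting, and the function name `build_word_corpora` is hypothetical. It does not attempt the union model, whose exact formulation the abstract does not give.

```python
from collections import defaultdict


def build_word_corpora(documents, significant_words):
    """Map each significant word to the tokenized documents that contain it.

    Assumption: one document is one extractable subset of the training corpus.
    """
    targets = set(significant_words)
    corpora = defaultdict(list)
    for doc in documents:
        tokens = doc.lower().split()  # assumed tokenization: whitespace only
        # A document joins the individual corpus of every significant word it contains.
        for word in targets.intersection(tokens):
            corpora[word].append(tokens)
    return dict(corpora)


# Toy training corpus (illustrative data only).
documents = [
    "the bank raised interest rates",
    "the river bank was flooded",
    "interest in the match was high",
]
corpora = build_word_corpora(documents, ["bank", "interest"])
```

Each entry of `corpora` could then serve as the training data for one word model; at test time, the significant words found in the cache would select which of these models to combine.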
Similar resources
Triggering individual word domains in n-gram language models
We present a new method of introducing domain knowledge into an n-gram language model. It is based on a combination of language models for individual word domains. Each word model is built from an individual corpus which is formed by extracting those subsets of the entire training corpus which contain that significant word. When testing, significant words are extracted from a cache and their mo...
Learning the lexicon from raw texts for open-vocabulary Korean word recognition
In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Due to the complex morphology of Korean, it is inappropriate to use lexicons based on linguistic entities such as words and morphemes in open-vocabulary domains. Instead, we build the lexicon by collecting variable length character sequences from the raw texts using a dynamic Ba...
Individual word language models and the frequency approach
We present a new method of introducing domain knowledge into an n-gram language model. It is based on a combination of language models for individual word domains. Each word model is built from an individual corpus which is formed by extracting those subsets of the entire training corpus which contain that significant word. When testing, significant words are extracted from a cache and their mo...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Analyzing Effect of Organizational Strategies on Organizational results using system dynamics based upon EFQM model
Abstract: To evaluate the relationship between organizational strategies and organizational results, a comprehensive model is required that can capture all aspects of business excellence. The EFQM model is a suitable tool for observing these factors. The EFQM model consists of two main domains: Enablers and Results. The first domain which includes processes and systems in g...
Publication date: 2000